In the era of big data, it is becoming common to have data with multiple modalities or coming from multiple sources, known as "multi-view data". Because multi-view data are usually unlabeled and come from high-dimensional spaces (such as language vocabularies), unsupervised multi-view feature selection is crucial to many applications. However, it is nontrivial due to the following challenges. First, there may be too many instances or the feature dimensionality may be too large, so the data may not fit in memory. How can useful features be selected with limited memory space? Second, how can features be selected from streaming data while handling concept drift? Third, how can the consistent and complementary information from different views be leveraged to improve feature selection when the data are too big or arrive as streams? To the best of our knowledge, no previous work solves all of these challenges simultaneously. In this paper, we propose an Online unsupervised Multi-View Feature Selection method, OMVFS, which deals with large-scale/streaming multi-view data in an online fashion. OMVFS embeds unsupervised feature selection into a clustering algorithm via nonnegative matrix factorization (NMF) with sparse learning. It further incorporates graph regularization to preserve the local structure information and help select discriminative features. Instead of storing all the historical data, OMVFS processes the multi-view data chunk by chunk and aggregates all the necessary information into several small matrices. By using a buffering technique, OMVFS reduces the computational and storage cost while taking advantage of the structure information. Furthermore, OMVFS can capture concept drift in the data streams. Extensive experiments on four real-world datasets show the effectiveness and efficiency of the proposed OMVFS method. More importantly, OMVFS is about 100 times faster than the off-line methods.
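The chunk-by-chunk idea can be illustrated with a minimal single-view sketch. It is not the OMVFS algorithm itself: it omits the multi-view consensus, the sparsity term, the graph regularization, and the buffering described above, and it uses standard multiplicative NMF updates as a stand-in. The function name `online_nmf_feature_scores`, its parameters, and the matrices `A` and `B` are hypothetical names chosen for illustration. Each chunk X is approximated as V U^T, only the small matrices A (sum of V^T V) and B (sum of X^T V) are kept between chunks, and features are scored by the row norms of U.

```python
import numpy as np

def online_nmf_feature_scores(chunks, n_features, k, inner_iters=20, eps=1e-9, seed=0):
    """Score features from a stream of nonnegative data chunks (illustrative only)."""
    rng = np.random.default_rng(seed)
    U = rng.random((n_features, k)) + eps   # feature-by-cluster matrix
    A = np.zeros((k, k))                    # running sum of V^T V
    B = np.zeros((n_features, k))           # running sum of X^T V
    for X in chunks:                        # X has shape (chunk_size, n_features)
        # Fit the cluster indicators of this chunk with U held fixed.
        V = rng.random((X.shape[0], k)) + eps
        for _ in range(inner_iters):
            V *= (X @ U) / (V @ (U.T @ U) + eps)
        # Fold the chunk into the small summary matrices; X can now be discarded.
        A += V.T @ V
        B += X.T @ V
        # Refine U using only the accumulated summaries.
        for _ in range(inner_iters):
            U *= B / (U @ A + eps)
    # A larger row norm of U marks a feature as more important (assumed criterion).
    return np.linalg.norm(U, axis=1)
```

In this sketch the memory held between chunks is one k-by-k matrix and two d-by-k matrices, independent of the number of instances seen so far, which is the property the chunk-wise aggregation relies on.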